Persian Printed Document Analysis and Page Segmentation

نویسندگان

  • Jamshid Shanbehzadeh tarbiat moalem
چکیده مقاله:

This paper presents, a hybrid method, low-resolution and high-resolution, for Persian page segmentation. In the low-resolution page segmentation, a pyramidal image structure is constructed for multiscale analysis and segments document image to a set of regions. By high-resolution page segmentation, by connected components analysis, each region is segmented to homogeneous regions and identifying them as texts, images, and tables/drawings. The proposed method was experiment with the Persian documents. The result of these tests have shown that the proposed method provide more accurate and speed results.

برای دانلود باید عضویت طلایی داشته باشید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

persian printed document analysis and page segmentation

this paper presents, a hybrid method, low-resolution and high-resolution, for persian page segmentation. in the low-resolution page segmentation, a pyramidal image structure is constructed for multiscale analysis and segments document image to a set of regions. by high-resolution page segmentation, by connected components analysis, each region is segmented to homogeneous regions and identifying...

متن کامل

Skew Detection, Page Segmentation, and Script Classiication of Printed Document Images

Automatic processing of international documents presents a number of challenging problems because Optical Character Recognition (OCR) techniques are not available for all languages and all script classes. Document images must be categorized according to their script type rst, in our case Roman, Ideographic, or Arabic. We present a set of statistical methods that rst detect and correct the skew ...

متن کامل

Ground-truthing and benchmarking document page segmentation

We describe a new approach for evaluating page segmentation algorithms. Unlike techniques that rely on OCR output, our method is region-based: the segmentation output, described as a set of regions together with their types, output order etc., is matched against the pre-stored set of ground-truth regions. Misclassifications, splitting, and merging of regions are among the errors that are detect...

متن کامل

Line and Ligature Segmentation in Printed Urdu Document Images

This paper presents a technique for segmentation of printed Urdu text images into lines and ligatures, a key pre-processing step in Urdu Optical Character Recognition (OCR) systems. Unlike classical projection profile based line segmentation methods, the proposed scheme successfully segments overlapping and touching lines. Once the lines are segmented, ligatures are extracted from each text lin...

متن کامل

Persian/Arabic Document Segmentation Based On Pyramidal Image Structure

Automatic transformation of paper documents into electronic documents requires document segmentation at the first stage. However, some parameters restrictions such as variations in character font sizes, different text line spacing, and also not uniform document layout structures altogether have made it difficult to design a general-purpose document layout analysis algorithm for many years. Thus...

متن کامل

منابع من

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}


عنوان ژورنال

دوره 1  شماره 1

صفحات  -

تاریخ انتشار 2010-02-25

با دنبال کردن یک ژورنال هنگامی که شماره جدید این ژورنال منتشر می شود به شما از طریق ایمیل اطلاع داده می شود.

میزبانی شده توسط پلتفرم ابری doprax.com

copyright © 2015-2023